(54) Method and system for tracking resource allocation within a processor



(57) A method and system are disclosed for tracking 
the allocation of resources within a processor having 
multiple execution units which support speculative exe- 
cution of instructions. The processor includes a re- 
source counter including a first counter and a second 
counter and a number of resources, wherein one or 
more of the resources are allocated to each of a number 
of instructions dispatched for execution to the execution 
units. In response to dispatching an instruction among 
the plurality of instructions to one of the execution units 
for execution, the first counter is incremented once for 
each of the resources allocated to the instruction, and if 
the instruction is a first instruction within a speculative 
execution path, the second counter is loaded with a val- 
ue of the first counter prior to incrementing the first coun- 
ter. In response to completion of a particular instruction 
among the number of instructions dispatched to one of 
the multiple execution units, the first and the second 
counters are decremented once for each resource allo- 
cated to the particular instruction. In response to a ref- 
utation of the speculative execution path, a value of the 
second counter is transferred to the first counter, such 
that the resource counter tracks a number of the plurality 
of resources allocated to the plurality of instructions. 
Description

BACKGROUND OF THE INVENTION 

1. Technical Field: 

The present invention relates in general to an im- 
proved method and system for data processing, and in 
particular to an improved method and system for tracking
the allocation of resources within a processor that supports
speculative execution of instructions. Still more
particularly, the present invention relates to a method 
and system for tracking the allocation of resources with- 
in a speculatively executing processor which enable the 
processor to recover the state of resource allocation fol- 
lowing a mispredicted branch. 

2. Description of the Related Art: 

Designers of state-of-the-art processors are contin- 
ually attempting to improve performance of such proc- 
essors. Recently, processor designers have developed 
a number of architectural enhancements that have sig- 
nificantly improved processor performance over proc- 
essors utilizing conventional architectures. For exam- 
ple, Reduced Instruction Set Computer (RISC) proces- 
sors utilize reduced instruction sets that enable such 
processors to achieve low cycle-per-instruction (CPI) 
ratios. To further increase throughput, processors can 
also employ a superscalar architecture that enables
multiple instructions to be issued and executed
simultaneously by a number of execution units. As a further
enhancement, execution units within a superscalar
processor can be designed to execute in a pipelined fashion
in which each execution unit processes multiple instruc- 
tions simultaneously with one or more instructions at 
each stage of execution. Finally, state-of-the-art proc- 
essors are equipped to execute instructions in an order 
determined by the availability of execution units rather 
than by sequential programmed order. This so-called 
"out of order" execution enables a processor to maxi- 
mize the utilization of available execution unit resources 
during each cycle. 

In a typical pipelined superscalar processor that
supports out-of-order processing, one or more
instructions are dispatched each cycle to a number of
execution units. The instructions are executed
opportunistically as execution unit resources become
available, with the caveat that the execution units must
adhere to data dependencies between instructions. That is,
if the execution of a first instruction depends upon data
resulting from the execution of a second instruction, the
second instruction must be executed prior to the first
instruction. After an execution unit has completed
processing an instruction, the instruction is forwarded to
one of a number of completion buffers within the
superscalar processor. A completion (rename) buffer is a
temporary buffer which holds an instruction until the
instruction is completed by transferring the data
associated with the instruction from temporary registers
to architected registers within the processor.

Although instructions can execute in any order as 
long as data dependencies are observed, most processors
require that instructions are completed (i.e., data
committed to architected registers) in program order.
One reason for the requirement of in-order completion 
is to enable the processor to support precise interrupt 
and exception handling. For example, when an exception
such as a divide-by-zero arithmetic error occurs, an
exception handler software routine will be invoked to 
manage the interrupt or exception. However, before the 
exception handler can be invoked, instructions preced- 
ing the instruction which generated the exception have 
to be completed in program order for the exception han- 
dler to execute in an environment that emulates the en- 
vironment which would exist had the instructions been 
executed in program order. A second reason for the re- 
quirement of in-order completion is to enable proper re- 
covery of a prior context if a branch is guessed wrong. 
As will be appreciated by those skilled in the art,
superscalar processors typically include a branch execution
unit, which predicts the result of branch instructions. 
Since the result of a branch instruction is guessed and 
instructions following the branch instruction reentry 
point are executed speculatively, the processor must 
have a mechanism for recovering a prior processor context
if the branch is later determined to have been
guessed wrong. Consequently, speculatively executed 
instructions cannot be completed until branch instruc- 
tions preceding the speculatively executed instructions 
in program order have been completed. 

In order to complete instructions executed out-of- 
order in program order, the processor must be equipped 
with facilities which track the program order of
instructions during out-of-order execution. In conventional
superscalar processors which support out-of-order
execution, the program order of instructions is tracked by each
of the execution units. However, as the number of exe- 
cution units and the number of instructions which may 
be executed out-of-order increase, tracking the program 
order of instructions burdens the performance of the ex- 
ecution units. Consequently, it would be desirable to 
provide an improved method and system for managing 
the instruction flow within a superscalar processor which
enables instructions to be dispatched in-order, executed
out-of-order, and completed in-order and which does not
require that the execution units track the program order
of instructions.

A second source of performance problems within 
processors which support speculative execution of
instructions is the recovery of the state of processor
resources following a mispredicted branch. Typically,
processors which support speculative execution of
instructions include a branch history table (BHT) that
enables a processor to predict the outcome of branch
instructions based upon prior branch outcomes. Thus,
utilizing data within the BHT, the processor will begin
execution of one or more sequential speculative execution
paths which follow branch instruction reentry points. In
conventional processors which support speculative
execution, once a branch is determined to be guessed
wrong, the processor stalls the execution pipeline until
all sequential instructions preceding the misguessed
branch are completed. Once all valid data is committed 
from the rename buffers to architected registers, all of 
the rename buffers are flushed and reset. Thereafter, 
the processor continues execution and allocation of the 
rename buffers beginning with the sequential instruction 
following the alternative execution path. Although this 
recovery mechanism guarantees that all of the proces- 
sor resources will be available following a mispredicted 
branch, the conventional recovery mechanism de- 
grades processor performance since the processor 
must delay dispatching additional instructions and allo- 
cating rename buffer resources until all instructions pre- 
ceding the misguessed branch are completed. 

Consequently, it would be desirable to provide an 
improved method and apparatus within a processor 
which enable the processor to restore the correct state 
of processor resources once a speculative execution 
path is determined to be mispredicted. 

SUMMARY OF THE INVENTION 

It is therefore one object of the present invention to 
provide an improved method and system for data 
processing. 

It is another object of the present invention to pro- 
vide an improved method and system for tracking the 
allocation of resources within a processor that supports 
speculative execution of instructions. 

It is yet another object of the present invention to 
provide an improved method and system for tracking the 
allocation of resources within a speculatively executing 
processor which enable the processor to recover the 
state of resource allocation following a mispredicted 
branch. 

The foregoing objects are achieved as is now de- 
scribed. A method and system are disclosed for tracking 
the allocation of resources within a processor having 
multiple execution units which support speculative exe- 
cution of instructions. The processor includes a re- 
source counter including a first counter and a second 
counter and a number of resources, wherein one or 
more of the resources are allocated to each of a number 
of instructions dispatched for execution to the execution 
units. In response to dispatching an instruction among 
the plurality of instructions to one of the execution units 
for execution, the first counter is incremented once for 
each of the resources allocated to the instruction, and if
the instruction is a first instruction within a speculative
execution path, the second counter is loaded with a value
of the first counter prior to incrementing the first
counter. In response to completion of a particular
instruction among the number of instructions dispatched to
one of the multiple execution units, the first and the
second counters are decremented once for each resource
allocated to the particular instruction. In response to a
refutation of the speculative execution path, a value of the
second counter is transferred to the first counter, such
that the resource counter tracks a number of the plurality
of resources allocated to the plurality of instructions.

The above as well as additional objectives, features,
and advantages of the present invention will become
apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the
invention are set forth in the appended claims. The
invention itself, however, as well as a preferred mode of
use, further objectives and advantages thereof, will best be
understood by reference to the following detailed
description of an illustrative embodiment when read in
conjunction with the accompanying drawings, wherein:

Figure 1 illustrates a preferred embodiment of a data
processing system which utilizes the method and
system of the present invention;

Figure 2 depicts a block diagram of the system unit
of the data processing system illustrated in Figure 1;

Figure 3 illustrates a block diagram of a preferred
embodiment of a processor which employs the
method and system of the present invention;

Figure 4 depicts a more detailed block diagram of
the instruction sequencing table (IST) illustrated in
Figure 3;

Figure 5 illustrates a preferred embodiment of a
counter which indicates a number of allocated
entries within the instruction sequencing table
depicted in Figure 4;

Figure 6 depicts a preferred embodiment of a
counter which indicates a number of allocated
floating-point rename buffers;

Figure 7 illustrates a preferred embodiment of a
counter which indicates a number of allocated
general purpose rename buffers;

Figure 8 depicts a flowchart of the operation of the
instruction sequencing table during a dispatch
cycle;

Figure 9 illustrates a flowchart of the operation of
the instruction sequencing table during a finish
cycle; and

Figure 10 depicts a flowchart of the operation of the
instruction sequencing table during a completion
cycle.

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENT 



With reference now to the figures and in particular 
with reference to Figure 1, there is illustrated a block 
diagram of a data processing system which employs the
method and system of the present invention. As
illustrated, data processing system 10 comprises system
unit 12 and one or more local nodes 14, which include
personal computer 16, display 18, keyboard 20 and
mouse 22. As is well-known to those skilled in the art, a
user inputs data to personal computer 16 utilizing
keyboard 20, mouse 22, or other suitable input device. The
user may then process the data locally utilizing personal 
computer 16, or transmit the data from personal com- 
puter 16 to system unit 12 or another node 14 utilizing 
well-known networking techniques. It is advantageous 
for a user to send tasks to system unit 12 for execution 
since system unit 12 can execute tasks in a relatively 
short period of time compared to node 14. System unit 
12 and personal computer 16 output data to a user via 
display device 18. 

Referring now to Figure 2, there is depicted a block 
diagram of system unit 12, which in a preferred embod- 
iment of the present invention comprises a symmetric 
multiprocessor computer, such as the IBM RISC
System/6000. System unit 12 includes one or more CPUs
30, which each include an on-board level one (L1) cache
32. Each CPU 30 is also associated with a level two (L2)
cache 34. As will be understood by those skilled in the
art, L1 caches 32 and L2 caches 34 each comprise a
small amount of high-speed memory which stores
frequently accessed segments of data and instructions. If
data requested by a CPU 30 is not resident within the
L1 cache 32 or L2 cache 34 associated with CPU 30,
the requested data is retrieved from main memory 36
via system bus 38. 

System unit 12 also includes SCSI controller 40 and
bus interface 46. SCSI controller 40 enables a user to
attach additional SCSI devices 42 to system unit 12 via
peripheral bus 44. Bus interface 46 provides facilities
that enable multiple local nodes 14 to access system
resources available within system unit 12. As will be
appreciated by those skilled in the art, system unit 12
includes additional hardware coupled to system bus 38
that is not necessary for an understanding of the present
invention and is accordingly omitted for simplicity.

With reference now to Figure 3, there is illustrated
a preferred embodiment of a CPU 30 in accordance with
the method and system of the present invention. In the
preferred embodiment depicted in Figure 3, CPU 30
comprises a superscalar processor that issues multiple
instructions into multiple execution pipelines each cycle,
thereby enabling multiple instructions to be executed
simultaneously. CPU 30 has five execution units 60-68,
including fixed-point units 60 and 62, load-store unit 64,
floating-point unit 66, and logical condition register unit
68.

According to the present invention, CPU 30 also
includes instruction sequencing table (IST) 80, which
enables CPU 30 to track the execution of instructions by
execution units 60-68 and to complete instructions in
program order. Referring now to Figure 4, there is
depicted a block diagram of a preferred embodiment of IST
80. As illustrated, IST 80 includes a number of entries
110, which each contain a finish bit 112, exception code
field 114, general purpose register (GPR) field 116,
floating-point register (FPR) field 118, and branch bit 120.
Entries 110 are addressed by one of 16 instruction IDs,
which are each associated with an outstanding
instruction, that is, an instruction that has been dispatched but
not completed.
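
By way of illustration only, the organization of entries 110
may be modeled in software as follows. The Python names
ISTEntry and make_ist, the use of 0 to denote "no exception",
and the omission of field widths are assumptions made for
this sketch and are not taken from the illustrated embodiment.

```python
from dataclasses import dataclass
from typing import List

NUM_IST_ENTRIES = 16  # entries are addressed by one of 16 instruction IDs


@dataclass
class ISTEntry:
    """One entry 110. Field names are illustrative; widths are not modeled."""
    finish: bool = False      # finish bit 112
    exception_code: int = 0   # exception code field 114 (0 = none, an assumption)
    gpr_count: int = 0        # GPR field 116
    fpr_count: int = 0        # FPR field 118
    branch: bool = False      # branch bit 120


def make_ist() -> List[ISTEntry]:
    """Build a table in which the list index plays the role of the instruction ID."""
    return [ISTEntry() for _ in range(NUM_IST_ENTRIES)]
```

In this sketch an instruction ID such as binary 1101 simply
indexes the list, so make_ist()[0b1101] names a single entry.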

With reference now to Figure 8, there is illustrated
a flowchart of the operation of IST 80 during a dispatch
cycle. As the process begins at block 200, instruction
fetch address register (IFAR) 52 calculates the address
of the next instructions to be fetched from instruction
cache 54 based upon information received from
program counter 104. The group of instructions specified
by the address generated by IFAR 52 is loaded in
parallel into instruction buffer 56 and dispatch unit 58 from
instruction cache 54. The process then proceeds to
block 202, which depicts determining a number of
available entries 110 within IST 80. In a preferred
embodiment of the present invention, the number of available
entries 110 within IST 80 is easily determined from an
IST entry counter 130 (illustrated in Figure 5) within
resource counters 98 that counts the number of allocated
IST entries 110. In the preferred embodiment illustrated
in Figure 4, up to three instructions can be dispatched
during each cycle if sufficient entries 110 are available
within IST 80.

Next, at block 204, instruction buffer 56 reads out
in program order a set of instructions for which IST
entries 110 are available. Utilizing resource availability
information received from completion unit 88 and
resource counters 98, dispatch unit 58 enables selected
ones of execution units 60-68 to begin execution of
instructions for which resources, such as rename buffers
90 and 92, are available. Each instruction dispatched
from instruction buffer 56 is assigned one of the
instruction IDs specified by dispatch pointers 82. Since
instructions are dispatched in program order, entries within IST
80 are allocated in program order. Thus, for the state of
IST 80 depicted in Figure 4, if only a single instruction
were dispatched during a dispatch cycle, that instruction
would be assigned the entry 110 associated with
instruction ID "1101" and specified as dispatch instruction ID 1
by dispatch pointers 82.

The process then proceeds to block 206, which
illustrates writing completion information into IST 80 for
each instruction dispatched. Each instruction issued
from instruction buffer 56 is processed by instruction
decode unit (IDU) 70. IDU 70 decodes each instruction to
determine the register resources required to complete
the instruction. Thus, by determining the type of each
instruction, IDU 70 can determine the number of general
purpose registers (GPRs) and floating-point registers
(FPRs) required to store the data associated with the
instruction. Once IDU 70 has determined the register
resources required to execute an instruction, IDU 70
writes the information into the appropriate entry 110
within IST 80. Next, the process proceeds to block 208,
which depicts determining which, if any, of the
dispatched instructions are speculative. If a dispatched
instruction is the first instruction within a speculative
execution path, the process proceeds to block 210, which
depicts storing the dispatch pointer 82 (i.e., instruction
ID) pointing to the entry allocated to the speculative
instruction as a backup pointer 84. Storing the instruction
ID of the first instruction within each speculative
execution path enables CPU 30 to recover the correct
execution context if a branch is later determined to have been
misguessed.

The process proceeds from either block 208 or
block 210 to block 212, which illustrates updating IST
entry counter 130 and dispatch pointers 82. IST entry
counter 130 is updated by IST control 100, which
increments or decrements IST entry counter 130 by the net
number of entries 110 allocated during the cycle after
taking into account both dispatched and completed
instructions. Dispatch pointers 82 are updated by
incrementing the instruction ID to which dispatch pointers 82
point by the number of instructions dispatched during
the cycle. Utilizing rotating pointers rather than a shifting
queue enhances the performance of IST 80 since only
dispatch pointers 82 are updated each cycle rather than
every entry 110. Thereafter, the process proceeds to
block 214 where the process terminates.
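
The pointer update described above may be sketched as
follows, as an illustration only. The function name dispatch
is hypothetical, and the wrap-around of instruction IDs
modulo 16 is an assumption of the sketch suggested by the
term "rotating pointers" rather than a stated detail of the
embodiment.

```python
NUM_INSTRUCTION_IDS = 16  # 16 instruction IDs, per the description of entries 110


def dispatch(dispatch_pointer: int, count: int):
    """Return (IDs assigned in program order, new dispatch pointer).

    Only the pointer moves; no table entry is shifted.
    Wrap-around modulo 16 is an assumption of this sketch.
    """
    if not 0 <= count <= 3:
        raise ValueError("up to three instructions are dispatched per cycle")
    assigned = [(dispatch_pointer + i) % NUM_INSTRUCTION_IDS for i in range(count)]
    new_pointer = (dispatch_pointer + count) % NUM_INSTRUCTION_IDS
    return assigned, new_pointer
```

For example, under these assumptions, dispatching three
instructions with the pointer at ID 15 assigns IDs 15, 0 and 1
and leaves the pointer at ID 2.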

Referring now to Figure 9, there is depicted a
flowchart of the operation of IST 80 during a finish cycle. As
is well known to those skilled in the art, each of execution
units 60-68 is an execution pipeline having multiple
stages, such as fetch, decode, execute, and finish,
which can accommodate one or more instructions at
each stage. Because execution units 60-68 operate
independently and because the number of cycles required
to execute instructions can vary due to data
dependencies, branch resolutions, and other factors, execution
units 60-68 execute instructions out of program order.
As illustrated, the process begins at block 230 and
thereafter proceeds to block 232, which depicts IST 80
receiving an instruction ID and finish report from
execution units 60-68 for each instruction finished during the
cycle. The finish report includes an exception code
which identifies the exception generated by execution
of the instruction, if any. The process then proceeds to
block 234, which illustrates IST 80 writing the exception
code received at block 232 into the exception code field
114 of the entry 110 identified by the finished
instruction's ID. In addition, at block 234, finish bit 112 within
entry 110 is set to indicate that the instruction has
finished execution. In a preferred embodiment of the
present invention, up to six finish reports can be written
to IST 80 during a finish cycle. Following block 234, the
process terminates at block 236.
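
As a minimal sketch of blocks 232 and 234, assuming a table
represented as a list of dictionaries, the finish cycle may
be modeled as shown below. The names new_ist and
finish_cycle and the convention that exception code 0 means
"no exception" are assumptions of the sketch.

```python
MAX_FINISH_REPORTS = 6  # up to six finish reports per finish cycle, per the text


def new_ist(num_entries: int = 16):
    # Each entry holds only the two fields written during a finish cycle.
    return [{"finish": False, "exception_code": 0} for _ in range(num_entries)]


def finish_cycle(ist, reports):
    """Apply (instruction_id, exception_code) finish reports to the table.

    Reports may arrive in any order, since execution is out of program order.
    """
    if len(reports) > MAX_FINISH_REPORTS:
        raise ValueError("too many finish reports for one cycle")
    for instruction_id, exception_code in reports:
        entry = ist[instruction_id]
        entry["exception_code"] = exception_code  # exception code field 114
        entry["finish"] = True                    # finish bit 112
```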

With reference now to Figure 10, there is depicted
a flowchart of the operation of IST 80 during a
completion cycle. As illustrated, the process begins at block 240
and thereafter proceeds to block 242, which depicts
completion unit 88 reading out instructions from IST 80
that are indicated by completion pointers 86. As
depicted in Figure 4, a preferred embodiment of the present
invention maintains three completion pointers 86 that
specify instructions which can potentially be completed
within a given processor cycle. The process then
proceeds from block 242 to block 244, which illustrates
completion unit 88 determining which of the instructions
read out at block 242 generated exceptions that have
not been handled. Completion unit 88 determines if an
instruction generated an exception by examining the
exception code field 114 associated with each instruction.
If the first instruction (i.e., the instruction whose
associated entry 110 is specified as completion instruction ID
1 by one of completion pointers 86) generated an
exception, the process proceeds from block 244 through
block 246 to block 248, which depicts forwarding the first
instruction to interrupt handling unit 102. As will be
understood by those skilled in the art, interrupt handling
unit 102 calls an exception handling vector associated
with the exception type specified by the exception code
written within exception code field 114. Thereafter, the
process proceeds from block 248 to block 254.

Returning to block 244, if the first instruction read
out from IST 80 did not generate an exception, the
process proceeds from block 244 through block 246 to block
249, which depicts determining which of the instructions
read out at block 242 can be completed during the
current cycle. In order to support precise interrupts, several
constraints are placed upon the completion of
instructions. First, only instructions that are marked as finished
within IST 80 by finish bit 112 can be completed.
Second, instructions that generated an exception which has
not been handled cannot be completed in the present
completion cycle. Third, an instruction can be completed
only if all instructions preceding the instruction in
program order have already been completed or will be
completed during the current completion cycle. Finally, for
an instruction to be completed, the requisite number of
general purpose registers and floating-point registers
must be available within general purpose register file 94
and floating-point register file 96. Following block 249,
the process proceeds to block 250, which depicts
completion unit 88 completing instructions which satisfy the
foregoing conditions by writing data associated with the
instructions from GPR and FPR rename buffers 90 and
92 to GPR and FPR files 94 and 96.
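
The four completion constraints can be illustrated with the
following sketch. It is a simplified model only: the function
name count_completable and the dictionary keys are
hypothetical, an exception code of 0 is assumed to mean "no
exception", and register file availability is modeled as a
per-cycle write limit whose default values (two GPR writes and
three FPR writes) are assumptions of the sketch.

```python
def count_completable(candidates, gpr_writes=2, fpr_writes=3):
    """Return how many leading candidates can complete this cycle.

    `candidates` lists, in program order, the entries named by the
    completion pointers. Each is a dict with the keys "finish",
    "exception_code", "gprs" and "fprs".
    """
    completed = 0
    for entry in candidates:
        if not entry["finish"]:
            break  # first constraint: must be marked finished
        if entry["exception_code"] != 0:
            break  # second constraint: unhandled exception
        if entry["gprs"] > gpr_writes or entry["fprs"] > fpr_writes:
            break  # fourth constraint: registers not available
        gpr_writes -= entry["gprs"]
        fpr_writes -= entry["fprs"]
        completed += 1
    # Stopping at the first failure enforces the third constraint:
    # nothing completes ahead of an older, uncompleted instruction.
    return completed
```

Because the loop stops at the first instruction that fails
any test, a finished instruction located behind an unfinished
one is never counted, which preserves in-order completion.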

Thereafter, the process proceeds from block 250 to
block 252, which depicts IST control 100 freeing IST
entries 110 that are associated with the instructions
completed at block 250. IST control 100 frees IST entries
110 by incrementing each of completion pointers 86
once for each instruction completed. Thereafter, the
process proceeds to block 254 where the process
terminates.

Referring now to Figures 5-7, there are illustrated
block diagrams of IST entry counter 130, FPR rename
buffer counter 150, and GPR rename buffer counter
170, which together comprise resource counters 98.
With reference first to Figure 5, IST entry counter 130
includes multiplexers 132-137 and counters 138-142.
According to a preferred embodiment of the present
invention, counter 138 comprises a 17-bit shift counter
which indicates in decoded format how many of the 16
IST entries 110 are currently allocated to outstanding
instructions. Counter 138 is said to be in decoded format
since the position of a set bit (a binary "1") within the
counter indicates the number of allocated entries 110.
For example, when IST 80 is empty, only the least
significant (left most) bit is set, indicating that 0 entries 110
are allocated; if IST 80 is full, only the most significant
bit is set. By storing the counters in decoded format
rather than utilizing a register which is incremented and
decremented by adders, the present invention not only
minimizes the cycle time utilized to update counter 138, but
also minimizes the complexity of CPU 30 and the chip
substrate area consumed.
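
The decoded format can be illustrated in software as a value
with exactly one bit set. The following sketch is an
illustration only; the function names encode and decode are
hypothetical, and bit position 0 of a Python integer is
assumed to stand for the bit position that indicates 0
allocated entries.

```python
NUM_ENTRIES = 16
WIDTH = NUM_ENTRIES + 1  # 17 positions encode the counts 0 through 16


def encode(count: int) -> int:
    """Return the decoded-format value in which only bit `count` is set."""
    if not 0 <= count <= NUM_ENTRIES:
        raise ValueError("count out of range")
    return 1 << count


def decode(bits: int) -> int:
    """Return the count indicated by the position of the single set bit."""
    one_bit_set = bits > 0 and (bits & (bits - 1)) == 0
    if not one_bit_set or bits >= (1 << WIDTH):
        raise ValueError("not a valid decoded count")
    return bits.bit_length() - 1
```

Seventeen positions are needed because the counter must
distinguish seventeen states, namely 0 through 16 allocated
entries.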

During each cycle IST control 100 computes the net
change in the number of allocated entries 110 from the
number of instructions dispatched and completed
during that cycle. In a preferred embodiment of the present
invention, the net change in the number of allocated
entries 110 varies between +3 during cycles in which 3
instructions are dispatched and 0 instructions are
completed to -3 during cycles in which 3 instructions are
completed and 0 instructions are dispatched. IST
control 100 updates counter 138 to reflect the current
number of allocated entries 110 by selecting the
appropriate update input to multiplexer 132, which in turn
shifts the set bit within counter 138 a corresponding
number of bit positions. Because an entry 110 is
required for each instruction dispatched, counter 138
provides an interlock that prevents dispatch unit 58 from
dispatching more instructions than can be
accommodated within entries 110 in IST 80.
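
A minimal sketch of the shift update and of the dispatch
interlock is given below, assuming the same single-set-bit
representation with bit 0 standing for 0 allocated entries.
The function names update and dispatch_limit are
hypothetical, and the way the interlock is derived from the
counter is an assumption of the sketch.

```python
NUM_ENTRIES = 16


def update(bits: int, dispatched: int, completed: int) -> int:
    """Shift the single set bit by the net change for one cycle (+3 to -3)."""
    if not (0 <= dispatched <= 3 and 0 <= completed <= 3):
        raise ValueError("at most three dispatches and three completions")
    net = dispatched - completed
    shifted = bits << net if net >= 0 else bits >> -net
    if shifted == 0 or (shifted >> NUM_ENTRIES) > 1:
        raise OverflowError("count would leave the range 0..16")
    return shifted


def dispatch_limit(bits: int) -> int:
    """Interlock: number of instructions that may be dispatched this cycle."""
    allocated = bits.bit_length() - 1
    return min(3, NUM_ENTRIES - allocated)
```

No adder is involved in update: the new count is obtained
purely by moving the set bit, which mirrors the selection of
one of several shifted inputs by a multiplexer.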

IST entry counter 130 also includes backup buffer
counter A 140 and backup buffer counter B 142, which
comprise shift counters like counter 138. Backup buffer
counter A 140 indicates a number of allocated IST
entries 110 excluding instructions within a first speculative
execution path. Similarly, backup buffer counter B 142
indicates the number of allocated IST entries 110
excluding instructions within a second speculative
execution path. As will be appreciated by those skilled in the
art, embodiments of the present invention which support
more than two speculative execution paths include one
additional backup buffer counter for each additional
speculative execution path permitted.

When the first instruction within a speculative
execution path is dispatched, IST control 100 enables the
select input to mux 133 to load the value of counter 138,
which indicates the number of IST entries 110 allocated
prior to dispatching instructions during the current cycle,
into backup buffer counter A 140. In addition, IST control
100 selects the appropriate update input to mux 134 to
update backup buffer counter A 140. For example, if the
second and third instructions dispatched are
speculative and 3 outstanding instructions are completed during
the current cycle, IST control 100 selects the -2 update
input. As illustrated, counter 140 can be incremented by
a maximum of two entries since speculative instructions
account for at least one of the three instructions which
can be dispatched during the current cycle. During
cycles while speculative execution path A remains
unresolved, IST control logic 100 selects the appropriate
path A input of mux 134 to update backup buffer counter
A 140 to reflect the reduction in allocated entries 110
due to completion of outstanding nonspeculative
instructions. If speculative execution path A is resolved as
guessed correctly, the contents of backup buffer counter
A 140 are simply ignored. If, however, speculative
execution path A is resolved as guessed wrong, IST control
100 enables the select input to mux 137 to load the value
of backup buffer counter A 140 into counter 138. In
addition, IST control 100 selects the appropriate path A
input to mux 132 to account for instructions completed
during the current cycle. Thus, IST entry counter 138
maintains a correct count of allocated entries 110 even
in cases where branches are misguessed.
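
The interplay of counter 138 and backup buffer counter A 140
may be illustrated with the following behavioral sketch. It
uses ordinary integers rather than the decoded format, the
class and method names are hypothetical, and it assumes that
no instructions are dispatched in the cycle in which a
misprediction is resolved.

```python
class EntryCounterModel:
    """Behavioral model of counter 138 and backup buffer counter A 140."""

    def __init__(self):
        self.counter = 0      # models counter 138
        self.backup_a = None  # models backup buffer counter A 140

    def cycle(self, dispatched, completed, first_speculative=None):
        """Advance one cycle in which no misprediction is resolved.

        first_speculative is the 0-based position, within this cycle's
        dispatch group, of the first instruction of speculative path A,
        or None if no speculative path begins in this cycle.
        """
        if first_speculative is not None:
            # Load the pre-dispatch count, then apply the net update that
            # counts only the nonspeculative instructions dispatched.
            self.backup_a = self.counter + first_speculative - completed
        elif self.backup_a is not None:
            # Path A unresolved: only completions change the backup.
            self.backup_a -= completed
        self.counter += dispatched - completed

    def resolve(self, guessed_correctly, completed=0):
        """Resolve path A, accounting for completions in the same cycle."""
        if self.backup_a is None:
            raise RuntimeError("no speculative path outstanding")
        if guessed_correctly:
            self.counter -= completed  # the backup is simply ignored
        else:
            self.counter = self.backup_a - completed
        self.backup_a = None
```

With 3 entries allocated, a cycle in which the second and
third dispatched instructions are speculative and 3
instructions complete leaves the backup at 3 + 1 - 3 = 1,
which corresponds to the -2 update input in the example above.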

As will be appreciated by those skilled in the art,
mux 136 and backup buffer counter B 142 operate
similarly to mux 134 and backup buffer counter A 140 to
allow recovery from a second speculative execution
path taken prior to the resolution of speculative path A.
If speculative path A is resolved as correctly predicted
and speculative path B (the second speculative
execution path) is resolved as mispredicted, IST control 100
selects the appropriate input to mux 137 to load the
value of backup buffer counter B 142 into counter 138. In
addition, IST control 100 updates counter 138 by
selecting the appropriate path B input to mux 132 to account
for instructions completed during the current cycle.
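
Recovery with two outstanding speculative paths can be
sketched as shown below. This is an illustration under
stated assumptions, not a description of the circuit of
Figure 5: the class name SpeculativeCounter and its methods
are hypothetical, counts are plain integers, and the
discarding of the path B backup when path A is mispredicted
is an assumption based on path B being the younger path.

```python
class SpeculativeCounter:
    """One working count plus one backup count per unresolved path."""

    def __init__(self, max_paths=2):
        self.main = 0       # working count of allocated resources
        self.backups = []   # backup counts, oldest speculative path first
        self.max_paths = max_paths

    def dispatch(self, resources, starts_path=False):
        if starts_path:
            if len(self.backups) == self.max_paths:
                raise RuntimeError("no backup counter available")
            # Capture the count prior to incrementing for this instruction.
            self.backups.append(self.main)
        self.main += resources

    def complete(self, resources):
        # Completion decrements the working count and every backup.
        self.main -= resources
        self.backups = [b - resources for b in self.backups]

    def resolve_oldest(self, correct):
        if not self.backups:
            raise RuntimeError("no speculative path outstanding")
        backup = self.backups.pop(0)
        if not correct:
            self.main = backup    # restore the pre-speculation count
            self.backups.clear()  # assumption: younger paths are discarded
```

For example, if 4 resources are allocated nonspeculatively,
path A then allocates 2, path B allocates 3, and 1 resource
is released by completion, resolving path A as correct and
path B as mispredicted returns the working count to 5.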

Referring now to Figure 6, there is depicted a block
diagram of FPR rename buffer counter 150, which
indicates the number of allocated FPR rename buffers 92.
As is evident by inspection of Figure 6, FPR rename
buffer counter 150 functions much like IST entry counter
130. Backup buffer counter A 160 and backup buffer
counter B 162 maintain a correct count of the number
of allocated FPR rename buffers 92 in cases where
either of two branch instructions is mispredicted, thereby
enabling FPR rename buffer counter 150 to restore the
correct FPR buffer count to counter 158 in a single cycle.
In the illustrated embodiment, up to 3 of FPR rename
buffers 92 can be assigned to instructions and up to 3
of FPR rename buffers 92 can be written to FPR file 96
during each cycle.

With reference now to Figure 7, there is illustrated 
a preferred embodiment of GPR rename buffer counter 
170, which counts the number of GPR rename buffers 
90 assigned to outstanding instructions. As will be ap- 
preciated by those skilled in the art, GPR rename buffer 
counter 170 operates similarly to FPR rename buffer 
counter 150, except for a difference in the number of 
GPR rename buffers 90 which can be allocated and re- 
tired within a cycle. In the depicted embodiment, up to 
two of GPR rename buffers 90 can be assigned to each 
instruction upon dispatch since two GPR rename buff- 
ers 90 are required to execute a "load and update" in- 
struction. However, only two of GPR rename buffers 90 
can be written to GPR file 94 during a given completion 
cycle. 

The design of FPR and GPR rename buffer 
counters 150 and 170 enhances the performance of the
present invention as compared to prior art systems 
since resources allocated to mispredicted branches can 
be more quickly reallocated. Prior art processors which 
support speculative execution of instructions typically 
do not include facilities such as backup buffer counters 
A and B to enable the processors to recover the correct 
state of processor resources following a misguessed 
branch. In conventional processors which support spec- 
ulative execution, once a branch is determined to be 
guessed wrong, the processor stalls the execution pipe- 
line until all sequential instructions preceding the mis- 
guessed branch are completed. Once all valid data is 
committed from the rename buffers to architected reg- 
isters, all of the rename buffers are flushed and reset. 
Thereafter, the processor continues execution and allo- 
cation of the rename buffers beginning with the sequen- 
tial instruction following the alternative execution path. 
Although this mechanism is relatively efficient in terms 
of the circuitry required to recover from a misguessed 
branch, the recovery mechanism degrades processor 
performance since the processor must delay dispatching
additional instructions and allocating rename buffer
resources until all instructions preceding the misguessed
branch are completed.

As has been described, the present invention provides
an improved method and system for managing the
flow of instructions through a superscalar processor
which supports out-of-order execution. By maintaining
an entry corresponding to each outstanding instruction
within an instruction sequencing table, the present invention
enables instructions executed out of program
order by multiple execution units to be completed in order,
thereby supporting precise interrupts. Furthermore,
the present invention provides an efficient mechanism
for recovering from misguessed branches which enables
the recovery of both the program state and the resource
state of the processor prior to the misguessed
branch. Although a processor which employs the
present invention has been described with reference to
various limitations with respect to a number of instructions
which can be dispatched, finished, and completed
during a given processor cycle, those skilled in the art
will appreciate that these limitations are merely design
choices and do not serve as limitations on the present invention.
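The in-order completion behaviour summarized above can be illustrated with a small Python model. This is a sketch under stated assumptions rather than the disclosed hardware: the class `InstructionSequencingTable`, its method names, the use of increasing integers as instruction identifiers, and the limit of three completions per cycle are all hypothetical choices made for illustration.

```python
from collections import deque


class InstructionSequencingTable:
    """Model of a table that lets out-of-order finishes complete in program order."""

    def __init__(self, size, max_complete_per_cycle=3):
        self.size = size
        self.max_complete = max_complete_per_cycle  # assumed per-cycle limit
        self.entries = deque()                      # oldest instruction on the left
        self.next_id = 0

    def dispatch(self, name):
        """Assign the next identifier in program order; None if the table is full."""
        if len(self.entries) >= self.size:
            return None
        iid = self.next_id
        self.next_id += 1
        self.entries.append(
            {"id": iid, "name": name, "finished": False, "exception": None}
        )
        return iid

    def finish(self, iid, exception=None):
        """Record a finish report from an execution unit."""
        for entry in self.entries:
            if entry["id"] == iid:
                entry["finished"] = True
                entry["exception"] = exception
                return
        raise KeyError(iid)

    def complete_cycle(self):
        """Complete finished instructions, oldest first, stopping at the first
        unfinished or excepting instruction so that interrupts stay precise."""
        completed = []
        while self.entries and len(completed) < self.max_complete:
            oldest = self.entries[0]
            if not oldest["finished"] or oldest["exception"] is not None:
                break
            completed.append(self.entries.popleft()["name"])
        return completed
```

In this model an instruction that finishes early simply waits in the table until every older instruction has finished, which is what allows the architected state to reflect a single point in program order when an exception is taken.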

While the invention has been particularly shown
and described with reference to a preferred embodiment,
it will be understood by those skilled in the art that
various changes in form and detail may be made therein
without departing from the spirit and scope of the invention.

Claims 

1. A method for tracking the allocation of resources
within a processor which supports speculative execution
of instructions, said processor having a plurality
of execution units, a resource counter including
a first counter and a second counter, and a plurality
of resources, wherein one or more of said plurality
of resources are allocated to each of a plurality
of instructions dispatched for execution to said plurality
of execution units, said method comprising:

in response to dispatching an instruction
among said plurality of instructions to one of
said plurality of execution units for execution:

incrementing said first counter once for each of
said plurality of resources allocated to said instruction;

if said instruction is a first instruction within a
speculative execution path, loading said second
counter with a value of said first counter
prior to incrementing said first counter;

in response to completion of a particular instruction
among said plurality of instructions
dispatched to one of said plurality of execution
units, decrementing said first and said second
counters once for each resource allocated to
said particular instruction; and

in response to a refutation of said speculative
execution path, transferring a value of said second
counter to said first counter, wherein said
resource counter tracks a number of said plurality
of resources allocated to said plurality of
instructions.

2. The method for tracking the allocation of resources
within a processor of Claim 1, wherein said processor
comprises a superscalar processor capable of
dispatching and completing multiple instructions
during each cycle, wherein said step of loading said
second counter with a value of said first counter further
comprises:

incrementing said second counter once for each of
said plurality of resources allocated to nonspeculative
instructions among said plurality of instructions
that are dispatched concurrently with an instruction
which is a first instruction within a speculative execution
path.

3. The method for tracking the allocation of resources
within a processor of Claim 1, wherein said processor
supports a second speculative execution path
and said resource counter further includes a third
counter, said method further comprising:

in response to dispatching a selected instruction
among said plurality of instructions to one
of said plurality of execution units for execution,
wherein said selected instruction is a first instruction
within a second speculative execution
path, loading said third counter with a value of
said first counter prior to incrementing said first
counter;

in response to completion of a particular instruction
among said plurality of instructions
dispatched to one of said plurality of execution
units, decrementing said third counter once for
each resource allocated to said particular instruction; and

in response to resolution of a first speculative
execution path as correctly predicted and refutation
of said second speculative execution
path, transferring a value of said third counter
to said first counter, wherein said resource
counter tracks a number of said plurality of resources
allocated to said plurality of instructions.

4. The method for tracking the allocation of resources
within a processor of Claim 1, said first and said second
counters comprising first and second shift registers,
respectively, wherein each of said first and
said second shift registers indicates a number of allocated
resources among said plurality of resources
by a bit position of a set bit within said first and said
second shift registers, wherein said step of incrementing
said first counter comprises shifting said
set bit in a first direction within said first shift register
one bit position for each of said plurality of resources
allocated to said instruction, and wherein said
step of decrementing said first and said second
counters comprises shifting said set bits within said
first and said second shift registers in a second direction
one bit position for each resource allocated
to said particular instruction.
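The shift-register form of counter recited above may be easier to follow with a short model. The Python sketch below is illustrative only: the class name `ShiftRegisterCounter`, the choice of a left shift for incrementing, and the `load` method used to copy one register into another are assumptions of this example. A register of M+1 bit positions is used so that counts from 0 to M can each be represented by the position of the single set bit.

```python
class ShiftRegisterCounter:
    """One-hot counter: the position of the single set bit is the count."""

    def __init__(self, num_resources):
        self.width = num_resources + 1  # M+1 positions encode counts 0..M
        self.bits = 1                   # set bit at position 0 means zero allocated

    def increment(self, n=1):
        """Shift the set bit n positions in the first direction (left here)."""
        shifted = self.bits << n
        if shifted >= (1 << self.width):
            raise OverflowError("more resources allocated than exist")
        self.bits = shifted

    def decrement(self, n=1):
        """Shift the set bit n positions in the second direction (right here)."""
        if self.bits >> n == 0:
            raise ValueError("more resources released than allocated")
        self.bits >>= n

    def load(self, other):
        """Copy the value of another register, as when a backup is restored."""
        self.bits = other.bits

    @property
    def value(self):
        """Decode the position of the set bit into an ordinary integer."""
        return self.bits.bit_length() - 1
```

Because a count is changed by a shift rather than by binary addition, allocating or releasing several resources in one cycle corresponds to selecting a shift distance, which is the property the one-hot encoding is meant to show.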



5. An apparatus for tracking the allocation of resources
within a processor which supports speculative
execution of instructions, said processor having a 
plurality of execution units and a plurality of resourc- 
es, wherein one or more of said plurality of resourc- 
es are allocated to each of a plurality of instructions 
dispatched for execution to said plurality of execu- 
tion units, said apparatus comprising: 

a resource counter having a first counter and a 
second counter; 

means for incrementing said first counter once 
for each of said plurality of resources allocated 
to said instruction in response to dispatching an 
instruction among said plurality of instructions 
to one of said plurality of execution units for ex- 
ecution; 

means for loading said second counter with a
value of said first counter prior to incrementing 
said first counter in response to dispatching a 
particular instruction among said plurality of in- 
structions to one of said plurality of execution 
units for execution, wherein said particular in- 
struction is a first instruction within a specula- 
tive execution path; 

means for decrementing said first and said sec- 
ond counters once for each resource allocated 
to an instruction among said plurality of instruc- 
tions dispatched to said plurality of execution 
units in response to completion of said
instruction; and

means for transferring a value of said second 
counter to said first counter in response to ref- 
utation of said speculative execution path, 
wherein said resource counter tracks a number 
of said plurality of resources allocated to said 
plurality of instructions. 

6. The apparatus for tracking the allocation of resourc- 
es within a processor of Claim 5, wherein said proc- 
essor comprises a superscalar processor capable 
of dispatching and completing multiple instructions 
during each cycle, wherein said means for loading 
said second counter with a value of said first counter 
further comprises: 

means for incrementing said second counter once 
for each of said plurality of resources allocated to 
nonspeculative instructions among said plurality of 
instructions that are dispatched concurrently with 
an instruction which is a first instruction within a 
speculative execution path.

7. The apparatus for tracking the allocation of resources
within a processor of Claim 5, wherein said processor
supports a second speculative execution
path and said resource counter further includes a
third counter, said apparatus further comprising:

means for loading said third counter with a value
of said first counter prior to incrementing
said first counter in response to dispatching a
selected instruction among said plurality of instructions
to one of said plurality of execution
units for execution, wherein said selected instruction
is a first instruction within a second
speculative execution path;

means for decrementing said third counter
once for each resource allocated to a particular
instruction among said plurality of instructions
dispatched to said plurality of execution units in
response to completion of said particular instruction; and

means for transferring a value of said third
counter to said first counter in response to resolution
of a first speculative execution path as
correctly predicted and refutation of said second
speculative execution path, wherein said
resource counter tracks a number of said plurality
of resources allocated to said plurality of
instructions; and

said first and said second counters comprising
first and second shift registers, respectively,
wherein each of said first and said second shift
registers indicates a number of allocated resources
among said plurality of resources by a
bit position of a set bit within said first and said
second shift registers, wherein said means for
incrementing said first counter comprises
means for shifting said set bit in a first direction
within said first shift register one bit position for
each of said plurality of resources allocated to
said instruction, and wherein said means for
decrementing said first and said second
counters comprises means for shifting said set
bits within said first and said second shift registers
in a second direction one bit position for
each resource allocated to said particular instruction.

8. The apparatus for tracking the allocation of resources
within a processor of Claim 5, wherein said plurality
of resources comprise a plurality of rename
data buffers utilized to store data associated with
said plurality of instructions prior to completion; and
wherein said processor supports out-of-order execution
of said plurality of instructions and includes
an instruction sequencing table having a plurality of
entries, wherein each of said plurality of instructions
is assigned one of said plurality of entries sequentially
according to a program order of said plurality
of instructions, such that said plurality of instructions
can be completed according to said program
order, wherein said plurality of resources comprise
said plurality of entries within said instruction sequencing
table.

9. A superscalar processor, comprising: 

a plurality of execution units, wherein instruc- 
tions dispatched to said plurality of execution 
units can be executed out of program order; 

a plurality of user-accessible data registers; 

a plurality of rename buffers; 

means for dispatching instructions to said plu- 
rality of execution units; 

means for assigning an instruction identifier to 
each of a plurality of instructions dispatched to 
said plurality of execution units for execution, 
wherein an instruction identifier is assigned to 
each of said plurality of instructions sequential- 
ly according to a program order of said plurality 
of instructions; 

a table having a plurality of entries, wherein 
each entry among said plurality of entries is as- 
sociated with an instruction identifier and con- 
tains a finish indicator that indicates whether 
execution of an instruction assigned an instruc- 
tion identifier associated with said each entry 
has finished; 

means for setting a finish indicator within a par- 
ticular entry among said plurality of entries with- 
in said table in response to termination of exe- 
cution of an instruction assigned to an instruc- 
tion identifier associated with said particular en- 
try; 

one or more pointers which point to entries with- 
in said table associated with instruction identi- 
fiers assigned to a subset of said plurality of in- 
structions that can possibly be completed dur- 
ing a particular processor cycle, wherein a se- 
lected instruction among said subset is com- 
pleted by transferring data associated with said 
selected instruction from associated ones of 
said plurality of rename buffers to selected 
ones of said plurality of data registers; and 

means for completing selected instructions
within said subset of said plurality of instructions,
wherein exceptions generated by said
selected instructions have been handled,
wherein instructions among said plurality of
instructions which are assigned instruction identifiers
preceding said selected instructions
have been completed during a previous processor
cycle or will be completed during the
same processor cycle, and wherein instruction
identifiers assigned to said selected instructions
are associated with entries having set finish
indicators, such that said plurality of instructions
are completed according to said program
order.

10. The superscalar processor of Claim 9, wherein 
each entry within said table further comprises: 

a field specifying a number of said plurality of 
data registers required to complete an instruc- 
tion to which an instruction identifier associated 
with said each entry is assigned; and 

a field indicating exception conditions which oc- 
curred during execution of an instruction to 
which an instruction identifier associated with 
said each entry is assigned; 

11. The superscalar processor of Claim 9, said superscalar
processor having M rename buffers and
further comprising:

a rename buffer counter, including:

a primary shift register having M+1 bits, wherein
said primary shift register indicates a first
number of said plurality of rename buffers allocated
to instructions which are dispatched and
uncompleted by a position of a set bit within
said primary shift register;

a backup shift register having M+1 bits, said
backup shift register being associated with a
speculative execution path, wherein said backup
shift register indicates a second number of
said plurality of rename buffers allocated to instructions
which are dispatched and uncompleted
and are not within said speculative execution
path, said second number being indicated
by a position of a set bit within said backup
shift register; and

means for transferring said second number
from said backup shift register to said primary
shift register in response to a determination that
said speculative execution path was mispredicted; and

wherein said superscalar processor supports N
speculative execution paths, said data buffer
counter further comprising N backup shift registers;






EP 0 751 458 A1 




[Fig. 3: block diagram. Labeled blocks: Instruction Fetch Address Register; Instruction Cache; Instruction Buffer; LCRU; IDU; Instruction Sequencing Table; Completion Pointers; Dispatch Pointers; Backup Pointers; IST Control 100; Counters; Completion Unit; Interrupt Handling Unit; Program Counter; GPR Rename Buffers 90; FPR Rename Buffers 92; General Purpose Register File 94; Floating-Point Register File 96.]

[Fig. 4: instruction sequencing table with entries 110 indexed by instruction ID (0000 through 1111). Labels: Finish Reports from Execution Units; Instruction ID; Exception Code; GPR; FPR; Completion Instruction ID1, ID2, ID3; Dispatch Instruction ID1, ID2, ID3.]




[Fig. 8: flowchart. Box labels: Begin; Determine number of available IST entries; Read out group of instructions from instruction buffer; Dispatch instructions for which resources are available; Assign dispatched instructions an instruction ID; Write completion information of each instruction dispatched; Speculative instructions dispatched? (Yes: Store pointer to first instruction within each speculative path); Update IST entry counter and dispatch pointers; End.]

[Fig. 9: flowchart. Box labels: Begin; Receive finish reports from execution units; Write finish reports to indicated IST entries; End.]



[Fig. 10: flowchart. Box labels: Begin; Read out instructions to be retired from IST; Determine which if any of instructions generated exception; 1st instruction generated exception? (No / Yes); Determine which instructions can be completed; Write results associated with instructions to be completed from rename buffers to GPR/FPR register files; Handle exception; Free IST entries of completed instructions; End.]



EUROPEAN SEARCH REPORT

European Patent Office

Application Number: EP 96 48 9077

DOCUMENTS CONSIDERED TO BE RELEVANT

Category (only legible entry): Y

Citation of document with indication, where appropriate, of relevant passages:

PROCEEDINGS, SUPERCOMPUTING '93, 15 November 1993, PORTLAND, OREGON, US, pages 636-644, XP000437401, J. K. PICKETT ET AL: "Enhanced Superscalar Hardware: The Schedule Table" * the whole document *

IEEE TRANSACTIONS ON COMPUTERS, vol. 37, no. 5, May 1988, NEW YORK, NY, pages 562-573, XP000047779, J. E. SMITH ET AL: "Implementing Precise Interrupts in Pipelined Processors" * Sections IV-VI; figures 3-6 *

EP-A-0 312 239 (NORTHERN TELECOM LIMITED) * the whole document *

US-A-3 699 479 (R. THOMPSON ET AL) * column 2, line 4 - line 42; figures 1,4 *

EP-A-0 677 808 (MOTOROLA, INC.) * the whole document *

PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 1 - 3 December 1993, AUSTIN, TX, US, pages 202-213, XP000447502, M. MOUDGILL ET AL: "Register Renaming and Dynamic Speculation: an Alternative Approach" * the whole document *

Relevant to claim (in the order of the citations above): 1-11; 1-11; 1-11; 4,7,11; 1-11; 1-11

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): G06F9/38

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G06F

The present search report has been drawn up for all claims

Place of search: BERLIN
Date of completion of the search: 9 October 1996
Examiner: Abram, R

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document



